IN THE CLAIMS

Please cancel Claims 1, 3-41 and 43 without prejudice, and add new Claims 44-135, as 
follows: 

1. (Cancelled) 

2. (Previously presented) For use in a system involving an Embedded-DRAM 
processor, a method for intelligent caching comprising the steps of: 

splitting an architecture into first and second portions, said first portion comprising a set 
of functional units and a set of architectural registers exercised thereby, said second portion
comprising at least one functional unit capable of moving data between a main memory 
implemented as one or more banks of DRAM without a caching system that employs cache hits 
and cache misses, and said set of architectural registers; and 

splitting a single program into first and second concurrently executing portions which 
each concurrently execute distinct subsets of parallely dispatched instructions from one or more
instruction streams, said first portion of said program executed on said first portion of the 
architecture, said second portion of said program executed on said second portion of said 
architecture; 

wherein said second portion of said architecture is operative to prefetch data from said 
main memory into said architectural registers prior to being processed by said first portion of said
architecture, and wherein said second portion of said architecture is operative to move results 
produced by said first portion of said architecture into main memory after they are produced by 
said first portion of said architecture; and 

prior to when said first portion of said architecture executes a conditional branch 
instruction, said second portion of said architecture prefetches first and second data sets from
memory into said architectural registers, said first data set being needed for use as instruction 
operands when said condition evaluates to true, said second data set being needed for use as 
instruction operands when said condition evaluates to false. 

3-41. (Cancelled)

42. (Previously presented) The method of Claim 2, wherein speculative prefetching of
data is performed from said main memory so that the first program need not wait for the first or
the second data set to be fetched from said main memory, irrespective of the outcome of the
conditional instruction. 

43. (Cancelled) 

44. (New) The method of Claim 2, wherein the second portion generates a row
precharge instruction to precharge a DRAM row of a bank of the one or more banks of DRAM to
cause data to be ready prior to issuing a read command. 

45. (New) The method of Claim 2, wherein the set of architectural registers
comprises a register file comprising a plurality of registers and having a parallel access port
operative to load or store, under control of the second portion, contents of said register file in a
single DRAM access cycle from or to a DRAM row of a bank of the one or more banks of
DRAM.

46. (New) The method of Claim 45, wherein the load operation is performed with a mask
to allow certain of the contents of selected registers of the register file not to be modified by the
load operation.

47. (New) The method of Claim 45, wherein said register file further comprises at least a
second access port operative to transfer data between one or more of the functional units of the
first portion.

48. (New) The method of Claim 45, wherein the first portion and the second portion of
said architecture cooperate to execute a DRAM row selected by a row-address register, and said
register file further comprises at least a second access port operative to transfer data between one
or more of the functional units of the first portion.

49. (New) The method of Claim 45, wherein the register file can be placed into an
inactive state where the register file does not appear in the register space of the functional units
of the first portion.

50. (New) The method of Claim 45, wherein when the register file is placed into the
inactive state, the second portion is enabled to cause a parallel load or store operation to occur
between the parallel access port and a row of DRAM in a bank of the one or more banks of
DRAM.

51. (New) The method of Claim 2, wherein the first and second portions of said
architecture cooperatively execute instructions to process image data for display. 

52. (New) The method of Claim 2, wherein the first and second portions of said 
architecture cooperatively execute instructions to process video data for display. 

53. (New) The method of Claim 2, wherein the first and second portions of said
architecture cooperatively execute instructions to perform digital filtering operations.

54. (New) The method of Claim 2, wherein the first and second portions of said 
architecture cooperatively execute instructions to execute a video decoder algorithm. 

55. (New) For use in a system involving an Embedded-DRAM processor, a method 
for intelligent caching comprising:

providing an architecture split into first and second portions, said first portion comprising 
a set of functional units and a set of architectural registers exercised thereby, said second portion 
comprising at least one functional unit capable of moving data between a main memory 
implemented as one or more banks of DRAM, and said set of architectural registers, wherein the 
second portion accesses main memory without a caching system that employs cache hits and
cache misses; and 

providing a single program split into first and second concurrently executing portions 
which each concurrently execute distinct subsets of parallely dispatched instructions from one or 
more instruction streams, said first portion of said program executed on said first portion of the 
architecture, said second portion of said program executed on said second portion of said
architecture; 

wherein said second portion of said architecture is operative to prefetch data from said 
main memory into one or more of said architectural registers prior to being processed by said 
first portion of said architecture, and wherein said second portion of said architecture is operative
to move results produced by said first portion of said architecture into main memory after they
are produced by said first portion of said architecture; and 

prior to when said first portion of said architecture executes a conditional branch 
instruction, said second portion of said architecture prefetches first and second data sets from 
said main memory into said architectural registers, said first data set being needed for use as
instruction operands when said condition evaluates to true, said second data set being needed for
use as instruction operands when said condition evaluates to false. 

56. (New) The method of Claim 55, wherein speculative prefetching of data is
performed from said main memory so that the first program need not wait for the first or the 
second data set to be fetched from said main memory, irrespective of the outcome of the 
conditional instruction. 

57. (New) The method of Claim 55, wherein the second portion generates a row
precharge instruction to precharge a DRAM row of a bank of the one or more banks of DRAM to
cause data to be ready prior to issuing a read command. 

58. (New) The method of Claim 55, wherein the set of architectural registers 
comprises a register file comprising a plurality of registers and having a parallel access port
operative to load or store, under control of the second portion, contents of said register file in a
single DRAM access cycle from or to a DRAM row of a bank of the one or more banks of 
DRAM. 

59. (New) The method of Claim 58, wherein the load operation is performed with a mask 
to allow certain of the contents of selected registers of the register file not to be modified by the
load operation.

60. (New) The method of Claim 58, wherein said register file further comprises at least a 
second access port operative to transfer data between one or more of the functional units of the 
first portion. 

61. (New) The method of Claim 58, wherein the first portion and the second portion of 
said architecture cooperate to execute a DRAM row selected by a row-address register, and said
register file further comprises at least a second access port operative to transfer data between one
or more of the functional units of the first portion. 

62. (New) The method of Claim 58, wherein the register file can be placed into an 
inactive state where the register file does not appear in the register space of the functional units
of the first portion.

63. (New) The method of Claim 58, wherein when the register file is placed into the 
inactive state, the second portion is enabled to cause a parallel load or store operation to occur 
between the parallel access port and a row of DRAM in a bank of the one or more banks of 
DRAM. 

64. (New) The method of Claim 55, wherein the first and second portions of said
architecture cooperatively execute instructions to process at least one of image data and video 
data, for display. 

65. (New) The method of Claim 55, wherein the first and second portions cooperatively 
execute instructions to perform digital filtering operations.

66. (New) The method of Claim 55, wherein the first and second portions of said
architecture cooperatively execute instructions to execute a video decoder algorithm. 

67. (New) In an embedded-DRAM processor, a method for intelligent caching, 
comprising: 

providing an architecture comprising first and second portions, said first portion
comprising a set of functional units and a set of architectural registers exercised thereby, said 
second portion comprising at least one functional unit capable of moving data between a main 
memory implemented substantially as one or more banks of DRAM, and said set of architectural 
registers, wherein the second portion accesses main memory without a caching system that
employs cache hits and cache misses; and

providing a program comprising first and second program portions which each 
concurrently execute subsets of parallely dispatched instructions from one or more instruction 
streams, said first program portion executed on said first portion of the architecture, said second 
program portion executed on said second portion of said architecture; 

wherein said second portion of said architecture is operative to prefetch data from said
main memory into one or more of said architectural registers prior to being processed by said 
first portion of said architecture, and wherein said second portion of said architecture is operative 
to move results produced by said first portion of said architecture into main memory after they 
are produced by said first portion of said architecture; and 

prior to when said first portion of said architecture executes a conditional instruction, said
second portion of said architecture prefetches first and second data sets from said main memory 
into said architectural registers, said first data set being needed for use as instruction operands 
when said condition is true, said second data set being needed for use as instruction operands 
when said condition is false. 

68. (New) The method of Claim 67, wherein speculative prefetching of data is
performed from said main memory so that the first program portion need not wait for the first or
the second data set to be fetched from said main memory, irrespective of the outcome of the
conditional instruction. 

69. (New) The method of Claim 67, wherein the second portion generates a row 
precharge instruction to precharge a DRAM row of a bank of the one or more banks of DRAM to
cause data to be ready prior to issuing a read command.

70. (New) The method of Claim 67, wherein the set of architectural registers 
comprises a register file comprising a plurality of registers and having a parallel access port 
operative to load or store, under control of the second portion, contents of said register file in a 
single DRAM access cycle from or to a DRAM row of a bank of the one or more banks of
DRAM.

71. (New) The method of Claim 70, wherein the load operation is performed with a mask 
to allow certain of the contents of selected registers of the register file not to be modified by the 
load operation. 

72. (New) The method of Claim 70, wherein said register file further comprises at least a 
second access port operative to transfer data between one or more of the functional units of the
first portion.

73. (New) The method of Claim 70, wherein the first portion and the second portion of 
said architecture cooperate to execute a DRAM row selected by a row-address register, and said 
register file further comprises at least a second access port operative to transfer data between one
or more of the functional units of the first portion.

74. (New) The method of Claim 70, wherein the register file can be placed into an 
inactive state where the register file does not appear in the register space of the functional units 
of the first portion.

75. (New) The method of Claim 70, wherein when the register file is placed into the
inactive state, the second portion is enabled to cause a parallel load or store operation to occur
between the parallel access port and a row of DRAM in a bank of the one or more banks of
DRAM.

76. (New) The method of Claim 67, wherein the first and second portions of said
architecture cooperatively execute instructions to process at least one of image data or video data 
for display. 

77. (New) The method of Claim 67, wherein the first and second portions of said 
architecture cooperatively execute instructions to perform digital filtering operations.

78. (New) The method of Claim 67, wherein the first and second portions of said
architecture cooperatively execute instructions to execute a video decoder algorithm. 

79. (New) In an Embedded-RAM processing apparatus, a method for intelligent 
caching comprising: 

utilizing an architecture comprising first and second portions, said first portion
comprising a set of functional units and a set of architectural registers exercised thereby, said 
second portion comprising at least one functional unit capable of moving data between a main 
memory implemented as one or more banks of RAM, and said set of architectural registers, 
wherein the second portion accesses main memory without a caching system that employs cache
hits and cache misses; and

utilizing a single program having first and second concurrently executing portions to each 
concurrently execute distinct subsets of parallely dispatched instructions from one or more 
instruction streams, said first portion of said program executed on said first portion of the 
architecture, said second portion of said program executed on said second portion of said
architecture;

wherein said second portion of said architecture is operative to fetch data in said main 
memory prior to being loaded into said first portion of said architecture, and wherein said second 
portion of said architecture is operative to move results produced by said first portion of said 
architecture into main memory after they are produced by said first portion of said architecture; 
and

prior to when said first portion of said architecture executes a conditional instruction, said 
second portion of said architecture precharges first and second data sets in respective RAM rows, 
said first data set being needed for use as instruction operands when said condition is true, said 
second data set being needed for use as instruction operands when said condition is false. 

80. (New) The method of Claim 79, wherein the second portion of said architecture
generates a row precharge instruction to precharge a RAM row of a bank of the one or more 
banks of RAM to cause data to be ready prior to issuing a read command. 

81. (New) The method of Claim 79, wherein the set of architectural registers
comprises a register file comprising a plurality of registers and having a parallel access port
operative to load or store, under control of the second portion, contents of said register file in a 
single RAM access cycle from or to a RAM row of a bank of the one or more banks of RAM. 

82. (New) The method of Claim 81, wherein at least one load operation is performed with 
a mask to allow certain of the contents of selected registers of the register file not to be modified
by the at least one load operation.

83. (New) The method of Claim 81, wherein said register file further comprises at least a 
second access port operative to transfer data between one or more of the functional units of the 
first portion. 

84. (New) The method of Claim 81, wherein the first portion and the second portion of 
said architecture cooperate to execute a RAM row selected by a row-address register, and said
register file further comprises at least a second access port operative to transfer data between one 
or more of the functional units of the first portion. 

85. (New) The method of Claim 81, wherein the register file can be placed into an 
inactive state where the register file does not appear in the register space of the functional units
of the first portion.

86. (New) The method of Claim 81, wherein when the register file is placed into the
inactive state, the second portion is enabled to cause a parallel load or store operation to occur 
between the parallel access port and a row of RAM in a bank of the one or more banks of RAM. 

87. (New) The method of Claim 79, wherein the first and second portions of said
architecture cooperatively execute instructions to process at least one of image data or video data
for display. 

88. (New) The method of Claim 79, wherein the first and second portions of said 
architecture cooperatively execute instructions to perform digital filtering operations. 

89. (New) The method of Claim 79, wherein the first and second portions of said
architecture cooperatively execute instructions to execute a video decoder algorithm.

90. (New) The method of Claim 79, wherein said RAM comprises scroll-RAM.

91. (New) The method of Claim 79, wherein said RAM comprises synchronous DRAM 
(SDRAM). 

92. (New) In an Embedded-RAM processing apparatus, a method for intelligent 
caching comprising:

providing an architecture comprising first and second portions, said first portion 
comprising a set of functional units and a set of architectural registers exercised thereby, said 
second portion comprising at least one functional unit capable of moving data between a main 
memory implemented as one or more banks of RAM, and said set of architectural registers,
wherein the second portion accesses main memory without a caching system that employs cache
hits and cache misses; and 

utilizing a program comprising first and second program portions to each concurrently 
execute distinct subsets of parallely dispatched instructions from one or more instruction streams, 
said first program portion executed on said first portion of the architecture, said second program
portion executed on said second portion of said architecture;

wherein said second portion of said architecture is operative to prefetch data from said 
main memory and to pass it into one or more of said architectural registers prior to being 
processed by said first portion of said architecture, and wherein said second portion of said 
architecture is operative to move results produced by said first portion of said architecture into
main memory after they are produced by said first portion of said architecture; and

prior to when said first portion of said architecture executes a conditional instruction, said 
second portion of said architecture prefetches first and second data sets from said main memory 
into said architectural registers, said first data set being needed for use as instruction operands 
when said condition is true, said second data set being needed for use as instruction operands
when said condition is false.

93. (New) The method of Claim 92, wherein the second portion of said architecture 
generates a row precharge instruction to precharge a RAM row of a bank of the one or more 
banks of RAM to cause data to be ready prior to issuing a read command. 

94. (New) The method of Claim 92, wherein the set of architectural registers 
comprises a register file comprising a plurality of registers and having a parallel access port
operative to load or store, under control of the second portion, contents of said register file in a
single RAM access cycle from or to a RAM row of a bank of the one or more banks of RAM. 

95. (New) The method of Claim 94, wherein at least one load operation is performed with 
a mask to allow certain of the contents of selected registers of the register file not to be modified
by the at least one load operation.

96. (New) The method of Claim 94, wherein said register file further comprises at least a 
second access port operative to transfer data between one or more of the functional units of the 
first portion. 

97. (New) The method of Claim 94, wherein the first portion and the second portion of 
said architecture cooperate to execute a RAM row selected by a row-address register, and said
register file further comprises at least a second access port operative to transfer data between one 
or more of the functional units of the first portion. 

98. (New) The method of Claim 94, wherein the register file can be placed into an 
inactive state where the register file does not appear in the register space of the functional units
of the first portion.

99. (New) The method of Claim 94, wherein when the register file is placed into the 
inactive state, the second portion is enabled to cause a parallel load or store operation to occur 
between the parallel access port and a row of RAM in a bank of the one or more banks of RAM. 

100. (New) The method of Claim 92, wherein the first and second portions of said
architecture cooperatively execute instructions to process at least one of image data or video data
for display. 

101. (New) The method of Claim 92, wherein the first and second portions of said
architecture cooperatively execute instructions to perform digital filtering operations. 

102. (New) The method of Claim 92, wherein the first and second portions of said
architecture cooperatively execute instructions to execute a video decoder algorithm.

103. (New) The method of Claim 92, wherein said RAM comprises scroll-RAM. 

104. (New) The method of Claim 92, wherein said RAM comprises synchronous DRAM 
(SDRAM). 

105. (New) In an Embedded-DRAM processor, a method for caching comprising:

providing an architecture comprising first and second portions, said first portion
comprising a set of functional units and a set of registers exercised thereby, said second portion 
comprising at least one functional unit capable of moving data between a main memory 
implemented as one or more banks of DRAM, and said set of architectural registers, the second 
portion being capable of accessing main memory without a caching system that employs cache
hits or cache misses; and 

utilizing a program comprising first and second program portions which each 
concurrently execute distinct subsets of parallely dispatched instructions from one or more 
instruction streams, said first program portion executed on said first portion of the architecture, 
said second program portion executed on said second portion of said architecture;

wherein said second portion of said architecture is operative to prefetch data from said 
main memory and to pass it to one or more of said registers prior to being processed by said first 
portion of said architecture, and wherein said second portion of said architecture is operative to 
move results produced by said first portion of said architecture to frame buffer memory after they 
are produced by said first portion of said architecture; and

prior to when said first portion of said architecture executes a conditional instruction, said 
second portion of said architecture prefetches first and second data sets from main memory into 
said registers, said first data set being needed for use as instruction operands when said condition 
is true, said second data set being needed for use as instruction operands when said condition is
false.

106. (New) The method of Claim 105, wherein the second portion of said architecture
generates a row precharge instruction to precharge a DRAM row of a bank of the one or more
banks of DRAM to cause data to be ready prior to issuing a read command.

107. (New) The method of Claim 105, wherein the set of architectural registers
comprises a register file comprising a plurality of registers and having a parallel access port
operative to load or store, under control of the second portion, contents of said register file in a
single DRAM access cycle from or to a DRAM row of a bank of the one or more banks of
DRAM.

108. (New) The method of Claim 107, wherein at least one load operation is performed
with a mask to allow certain of the contents of selected registers of the register file not to be 
modified by the at least one load operation. 

109. (New) The method of Claim 107, wherein said register file further comprises at least
a second access port operative to transfer data between one or more of the functional units of the
first portion.

110. (New) The method of Claim 107, wherein the first portion and the second portion of
said architecture cooperate to execute a RAM row selected by a row-address register, and said
register file further comprises at least a second access port operative to transfer data between one
or more of the functional units of the first portion.

111. (New) The method of Claim 107, wherein the register file can be placed into an 
inactive state where the register file does not appear in the register space of the functional units 
of the first portion. 

112. (New) The method of Claim 107, wherein when the register file is placed into the 
inactive state, the second portion is enabled to cause a parallel load or store operation to occur
between the parallel access port and a row of RAM in a bank of the one or more banks of RAM. 

113. (New) The method of Claim 105, wherein the first and second portions of said 
architecture cooperatively execute instructions to process at least one of image data or video data 
for display. 

114. (New) The method of Claim 105, wherein the first and second portions of said
architecture cooperatively execute instructions to perform digital filtering operations. 

115. (New) The method of Claim 105, wherein the first and second portions of said
architecture cooperatively execute instructions to execute a video decoder algorithm. 

116. (New) For use in a system involving an Embedded-DRAM processor, a method
for caching comprising:

providing an architecture comprising first and second portions, said first portion 
comprising a set of functional units and a set of architectural registers exercised thereby, said 
second portion comprising at least one functional unit capable of moving data between a main 
memory implemented as one or more banks of DRAM, and said set of architectural registers, the 
second portion accessing main memory without a caching system that employs cache hits and
cache misses; and 

providing a program comprising first and second program portions concurrently 
executing respective ones of distinct subsets of parallely dispatched instructions from one or 
more instruction streams, said first program portion executing on said first portion of the
architecture, said second program portion executing on said second portion of said architecture; 

wherein said second portion of said architecture is operative to prefetch data from said 
main memory and to pass it into one or more of said architectural registers prior to processing by 
said first portion of said architecture, and wherein said second portion of said architecture is 
operative to move results produced by said first portion of said architecture into a frame buffer
memory after they are produced by said first portion of said architecture; and 

prior to when said first portion of said architecture executes an instruction, said second 
portion of said architecture prefetching first and second data sets from main memory into said 
architectural registers, said first and second data sets being needed for use as instruction 
operands.

117. (New) The method of Claim 116, wherein the second portion of said architecture
generates a row precharge instruction to precharge a DRAM row of a bank of the one or more 
banks of DRAM to cause data to be ready prior to issuing a read command. 

118. (New) The method of Claim 116, wherein the set of architectural registers
comprises a register file comprising a plurality of registers and having a parallel access port
operative to load or store, under control of the second portion, contents of said register file in a 
single DRAM access cycle from or to a DRAM row of a bank of the one or more banks of 
DRAM. 

119. (New) The method of Claim 118, wherein at least one load operation is performed
with a mask to allow certain of the contents of selected registers of the register file not to be
modified by the at least one load operation. 

120. (New) The method of Claim 118, wherein said register file further comprises at least 
a second access port operative to transfer data between one or more of the functional units of the 
first portion. 

121. (New) The method of Claim 118, wherein the first portion and the second portion of
said architecture cooperate to execute a RAM row selected by a row-address register, and said 
register file further comprises at least a second access port operative to transfer data between one 
or more of the functional units of the first portion. 

122. (New) The method of Claim 118, wherein the register file can be placed into an
inactive state where the register file does not appear in the register space of the functional units 
of the first portion. 

123. (New) The method of Claim 118, wherein when the register file is placed into the 
inactive state, the second portion is enabled to cause a parallel load or store operation to occur
between the parallel access port and a row of RAM in a bank of the one or more banks of RAM.

124. (New) The method of Claim 116, wherein the first and second portions of said 
architecture cooperatively execute instructions to process at least one of image data or video data 
for display. 

125. (New) The method of Claim 116, wherein the first and second portions of said 
architecture cooperatively execute instructions to perform digital filtering operations.

126. (New) The method of Claim 116, wherein the first and second portions of said
architecture cooperatively execute instructions to execute a video decoder algorithm. 

127. (New) For use in a system involving an Embedded-DRAM processor, said system 
comprising (i) an architecture comprising first and second portions, said first portion comprising
a set of functional units and a set of architectural registers exercised thereby, said second portion
comprising at least one functional unit capable of moving data between a main memory 
implemented as one or more banks of DRAM, and said set of architectural registers, wherein the 
second portion accesses main memory without a caching system that employs cache hits and 
cache misses, and (ii) a program comprising first and second program portions which each
concurrently execute distinct subsets of parallely dispatched instructions from one or more
instruction streams, said first program portion executed on said first portion of the architecture,
said second program portion executed on said second portion of said architecture, a method for 
intelligent caching comprising: 

prefetching, using at least said second portion of said architecture, data from said main
memory into one or more of said architectural registers prior to being processed by said first
portion of said architecture, and wherein said second portion of said architecture is operative to
move results produced by said first portion of said architecture into main memory after they are
produced by said first portion of said architecture; and 

prefetching, using at least said second portion of said architecture and prior to when said 
first portion of said architecture executes a conditional branch instruction, first and second data 
sets from said main memory into said architectural registers, said first data set being needed for
use as instruction operands when said condition is true, said second data set being needed for use 
as instruction operands when said condition is false. 

128. (New) For use in a system involving an Embedded-RAM processor, said system 
comprising (i) an architecture comprising first and second portions, said first portion comprising
a set of functional units and a set of architectural registers exercised thereby, said second portion
comprising at least one functional unit capable of moving data between a main memory 
implemented as one or more banks of RAM, and said set of architectural registers, wherein the 
second portion accesses main memory without a caching system that employs cache hits and 
cache misses, and (ii) a program comprising first and second program portions which each
concurrently execute distinct subsets of parallely dispatched instructions from one or more
instruction streams, said first program portion executed on said first portion of the architecture,
said second program portion executed on said second portion of said architecture, a method for 
intelligent caching comprising: 

fetching, using at least said second portion of said architecture, data in said main memory
prior to being loaded into said first portion of said architecture, and wherein said second portion
of said architecture is operative to move results produced by said first portion of said architecture 
into main memory after they are produced by said first portion of said architecture; and 

precharging, using at least said second portion of said architecture, and prior to when said 
first portion of said architecture executes a conditional instruction, first and second data sets in
respective RAM rows, said first data set being needed for use as instruction operands when said
condition is true, said second data set being needed for use as instruction operands when said 
condition is false. 

129. (New) In an Embedded-RAM processing apparatus comprising (i) an architecture
comprising first and second portions, said first portion comprising a set of functional units and a
set of architectural registers exercised thereby, said second portion comprising at least one
functional unit capable of moving data between a main memory implemented as one or more
banks of RAM, and said set of architectural registers, wherein the second portion accesses main
memory without a caching system that employs cache hits and cache misses, and (ii) a program 
comprising first and second program portions which concurrently execute distinct subsets of 
parallely dispatched instructions from one or more instruction streams, said first program portion 
executed on said first portion of the architecture, said second program portion executed on said
second portion of said architecture, a method for intelligent caching comprising: 

fetching, using at least said second portion of said architecture, data in said main memory 
prior to being passed to one or more of said architectural registers prior to being processed by 
said first portion of said architecture, and wherein said second portion of said architecture is
operative to move results produced by said first portion of said architecture into said main
memory after they are produced by said first portion of said architecture; and 

precharging, using at least said second portion of said architecture, and prior to when said 
first portion of said architecture executes a conditional instruction, from said main memory into 
said architectural registers, first and second data sets, said first data set being needed for use as
instruction operands when said condition is true, said second data set being needed for use as
instruction operands when said condition is false. 

130. (New) In an Embedded-DRAM processing apparatus comprising (i) an 
architecture split into first and second portions, said first portion comprising a set of functional 
units and a set of architectural registers exercised thereby, said second portion comprising at least
one functional unit capable of moving data between a main memory implemented as one or more
banks of DRAM, and said set of architectural registers, wherein the second portion accesses main 
memory without a caching system that employs cache hits and cache misses, and (ii) a program 
comprising first and second program portions which concurrently execute distinct subsets of 
parallely dispatched instructions from one or more instruction streams, said first program portion
executed on said first portion of the architecture, said second program portion executed on said
second portion of said architecture, a method for intelligent caching comprising: 

fetching, using at least said second portion of said architecture, data in said main memory 
prior to being passed to one or more of said architectural registers prior to being processed by 
said first portion of said architecture, and wherein said second portion of said architecture is
operative to move results produced by said first portion of said architecture into a frame buffer
memory after they are produced by said first portion of said architecture; and

charging, using at least said second portion of said architecture, and prior to when said
first portion of said architecture executes a conditional instruction, from said main memory into 
said architectural registers, first and second data sets, said first and second data sets being needed 
for use as instruction operands. 

131. (New) In an Embedded-DRAM processing apparatus comprising (i) an
architecture comprising first and second portions, said first portion comprising a set of functional
units and a set of architectural registers exercised thereby, said second portion comprising at least 
one functional unit capable of moving data between a main memory implemented as one or more 
banks of DRAM, and said set of architectural registers, wherein the second portion accesses main
memory without a caching system that employs cache hits and cache misses, and (ii) a program
comprising first and second program portions which concurrently execute distinct subsets of 
parallely dispatched instructions from one or more instruction streams, said first program portion 
executed on said first portion of the architecture, said second program portion executed on said 
second portion of said architecture, a method for intelligent caching comprising: 

fetching, using at least said second portion of said architecture, data in said main memory
prior to being passed to one or more of said architectural registers prior to being processed by
said first portion of said architecture, and wherein said second portion of said architecture is 
operative to move results produced by said first portion of said architecture into said main 
memory after they are produced by said first portion of said architecture; and 

charging, using at least said second portion of said architecture, and prior to when said
first portion of said architecture executes a conditional instruction, in respective DRAM rows,
first and second data sets, said first data set being needed for use as instruction operands when 
said condition is true, said second data set being needed for use as instruction operands when 
said condition is false. 

132. (New) In an Embedded-DRAM processing apparatus comprising (i) an
architecture comprising first and second portions, said first portion comprising a set of functional
units and a set of architectural registers exercised thereby, said second portion comprising at least 
one functional unit capable of moving data between a main memory implemented as one or more 
banks of DRAM, and said set of registers, wherein the second portion accesses main memory
without a caching system that employs cache hits and cache misses, and (ii) a program
comprising first and second program portions which concurrently execute distinct subsets of
parallely dispatched instructions from one or more instruction streams, said first program portion
executed on said first portion of the architecture, said second program portion executed on said 
second portion of said architecture, a method for intelligent caching comprising: 

fetching, using at least said second portion of said architecture, data in said main memory 
prior to being passed to one or more of said registers prior to being processed by said first portion
of said architecture, and wherein said second portion of said architecture is operative to move 
results produced by said first portion of said architecture into said main memory after they are 
produced by said first portion of said architecture; and 

prefetching, using at least said second portion of said architecture, and prior to when said 
first portion of said architecture executes a conditional instruction, in respective DRAM rows,
first and second data sets into said registers, said first and second data sets being needed for use 
as operands. 

133. (New) For use in an Embedded-DRAM processing apparatus (i) first architecture 
means and second architecture means, said first means comprising a set of functional means and
a set of architectural register means exercised thereby, said second architecture means comprising
at least one functional means capable of moving data between one or more banks of means for 
storing data, and said set of register means, wherein the second architectural means accesses said 
means for storing data without a caching system that employs cache hits and cache misses, and 
(ii) a program comprising first and second program portions which each concurrently execute
distinct subsets of parallely dispatched instructions from one or more instruction streams, said
first program portion executed on said first architecture means, said second program portion 
executed on said second architecture means, a method for caching comprising: 

prefetching, using at least said second architecture means, data from said means for 
storing data into one or more of said register means prior to being processed by said first
architecture means, and wherein said second architecture means is operative to move results
produced by said first architecture means into said means for storing after they are produced by 
said first architecture means; and 

prefetching, using at least said second architecture means and prior to when said first 
architecture means executes a conditional instruction, first and second data sets from said means
for storing into said register means, said first data set being needed for use as operands when said
condition is true, said second data set being needed for use as operands when said condition is
false. 

134. (New) In an Embedded-RAM processing apparatus comprising (i) an architecture 
comprising first and second portions, said first portion comprising a set of functional means and
a set of registers exercised thereby, said second portion comprising at least one functional means
capable of moving data between a main memory implemented as one or more banks of RAM, 
and said set of registers, wherein the second portion accesses main memory without a caching 
system that employs cache hits and cache misses, and (ii) a program comprising first and second 
program portions which concurrently execute distinct subsets of parallely dispatched instructions
from one or more instruction streams, said first program portion executed on said first portion of
the architecture, said second program portion executed on said second portion of said 
architecture, a method for caching comprising: 

fetching, using at least said second portion of said architecture, data in said main memory 
prior to being passed to one or more of said registers prior to being processed by said first portion
of said architecture, and wherein said second portion of said architecture is operative to move
results produced by said first portion of said architecture into said main memory after they are 
produced by said first portion of said architecture; and 

precharging first and second data sets into said registers, using at least said second portion 
of said architecture, and prior to when said first portion of said architecture executes a
conditional instruction, said first data set being needed for use as operands when said condition is
true, said second data set being needed for use as operands when said condition is false. 

135. (New) In an Embedded-RAM processing apparatus comprising (i) an architecture 
comprising first and second portions, said first portion comprising a set of functional means and 
a set of registers exercised thereby, said second portion comprising at least one functional means
capable of moving data between a main memory implemented as one or more banks of RAM,
and said set of registers, wherein the second portion accesses main memory without a caching 
system that employs cache hits and cache misses, and (ii) a program comprising first and second 
program portions which concurrently execute distinct subsets of parallely dispatched instructions 
from one or more instruction streams, said first program portion executed on said first portion of
the architecture, said second program portion executed on said second portion of said
architecture, a method for caching comprising:

fetching, using at least said second portion of said architecture, data in said main memory
prior to being passed to one or more of said registers prior to being processed by said first portion 
of said architecture, and wherein said second portion of said architecture is operative to move 
results produced by said first portion of said architecture into a frame buffer after they are 
produced by said first portion of said architecture; and 

charging first and second data sets into said registers, using at least said second portion of 
said architecture, and prior to when said first portion of said architecture executes a conditional 
instruction, said first and second data sets being needed for use as operands. 